Chapter 3. The Real Root Cause of the TMI Accident
Since the TMI accident, hundreds of people have stuck their noses into its root cause. Both the Kemeny and Rogovin investigations identified a lot of programmatic “stuff” that needed to be fixed, and I agree with most of it. However, I feel both of them skirted one important issue by using different flavors of “weasel words” in the discussion of Operator Error. The two reports handled that specific topic a bit differently, but in both the discussion got couched in side topics of contributing factors. Still, the general consensus of all the current discussion summaries I have read is that TMI was caused by Operator Error. The TMI Operators did make some Operator Errors, and I am not denying that. But my contention is that all the errors they made came after they had already gotten outside the Design Basis understanding of PWRs at that time. It is no surprise to anyone that when a machine this complicated gets outside of its design basis, anything might happen. You basically hope for the best, but you are going to have to take what you get. Fukushima proves that, and everyone knows how and why Fukushima got outside of its design basis. How and why the TMI Operators got outside of their design basis is going to be the focus of my discussion. I will also discuss the fact that I think this was understood at the time of the investigations, but it was consciously decided not to pursue it.
My whole point of contention is that turning off the High Pressure Injection flow early in the event, in response to the increasing Pressurizer level, is the crux of the whole Operator Error argument. All discussions say that if the Operators hadn’t done that, the TMI event would have been a no-never-mind. And I agree. But nobody really wants to believe they were told to do that for the symptoms they saw. In other words, they were told to do that by their training, compounded by tunnel-vision, bad procedure guidance. I have believed this since the day I understood what happened at TMI. Furthermore, the TMI Operators were trying to defend their actions from a position of weakness; their core was melted, and nobody wanted to believe them. I am not in a position of weakness on this issue. My event at DBNPP came out OK, so I have no reason not to be totally honest and objective about it. During the precursor event at DBNPP we also turned off High Pressure Injection early in the event in response to the symptoms we saw, and for the same reason the TMI Operators did it eighteen months later: we were told to do it that way. This fact is apparently a hard pill to swallow. But if it is hard for you to accept, just imagine how I felt watching TMI unfold in real time.
And right there is the crux of the issue. Once those High Pressure Injection pumps were off, both plants were outside the Design Basis understanding for that particular Small Break Loss Of Coolant Accident (SBLOCA). So you hope for the best, but take what you get. Still, an error has obviously been made if not taking that action would have made the event a no-never-mind. So who exactly made the error? Both the Kemeny and Rogovin Reports discuss the problems with the B&W Simulator training for the Operators. The important point they both apparently missed (or didn’t want to deal with, which I prefer as the explanation) is that this is really a problem with two independent parts. I will refer to controlling High Pressure Injection during a SBLOCA as part A of the problem, and to the actual physical PWR plant response to a SBLOCA caused by a leak in the Pressurizer steam space as part B of the problem. It really is that simple. B&W was training correctly for High Pressure Injection control (part A) for SBLOCAs in the water space of their PWR. But neither B&W nor Westinghouse correctly understood the actual plant response to a SBLOCA in the Pressurizer steam space. So by omission they were not training correctly for a SBLOCA in the Pressurizer steam space (part B). To make matters worse, B&W was overstressing the importance of the part A “rules” in training, to the extent that an Operator would fail a B&W administered Operator Certification Exam for failing to implement the part A rules correctly. Thus when, as fate would have it, the two occurrences, part A and part B, combined in the real world, where the plant responds per the rules of Mother Nature, the B&W training and procedures ended up leading the Operators to actions that put them outside the actual Design Basis, not the falsely perceived (and trained upon) Design Basis.
Up until very recently my argument has rested on just simple logic and the sheer number of Operators involved. In the DBNPP September ’77 event there were 5 licensed Operators involved in that decision, either by direct action or complacent compliance. In other words, all 5 agreed it was the right thing to do. Of course it wasn’t the right thing to do, but nobody objected because it was the correct part A thing to do and nobody yet understood part B of the problem. Eighteen months later at TMI, in March ’79, an additional number of Operators (just how many depends on the time line) repeated the same initial wrong actions. So we have about a dozen Operators, at 2 independent plants eighteen months apart, all doing the same thing, and all convinced they were doing the right thing. Is it even conceivable that they did not all believe they did the right thing according to part A? I just don’t believe so; of course, we are all arguing from a position of weakness. It is the wrong thing to do for part A and part B combined, so nobody really wants to believe we were trained to do it. But as I explained in the preceding paragraph, it is really the two-part problem that created the issue. My point is further emphasized by the fact that NRC Region III had heartburn over the report DBNPP submitted for our event. They did not like the fact that the report did not say the Operators made an error in turning off High Pressure Injection. I know why that happened. The person most responsible for writing the report narrative was actually in the Control Room during the event. He did not believe the action was an error, based on his own identical training on part A of the problem. So why would he put that statement in the report? He was so convinced his own (complacent) agreement was correct that saying otherwise would have been a false statement.
Just recently new information came to my attention that absolutely confirms my belief that B&W was in fact totally emphasizing High Pressure Injection control in their training based solely on their understanding of the part A problem, with no understanding on B&W’s part of the part B problem or its effect when combined with the part A problem. My understanding comes directly from seeing the whole infamous Walters response memo of November 10, 1977 to the original Kelly memo of November 1, 1977. It is absolutely remarkable to me that, 35-plus years after the DBNPP event and almost as long after TMI, a totally unrelated Google search turned up the complete Walters memo. After half a lifetime of studying all the TMI reports I had only seen one “cherry picked” excerpt from the Walters memo, basically saying he agreed with the Operators’ response at DBNPP. The whole memo in context basically confirms that the Operators’ claims of “we were trained to do it” are correct. The original Kelly memo also confirms that Kelly still didn’t grasp the significance of the part B problem as related to the DBNPP event; or if he did, he didn’t convey it thoroughly and clearly enough in his memo for it to be understood by the folks who could initiate corrective action. Both memos are presented and discussed below; draw your own conclusions. The source is here: Source
The Kelly Memo
THE BABCOCK & WILCOX COMPANY
POWER GENERATION GROUP
To Distribution
From J.J. Kelly, Plant Integration
Cust . Generic Date November 1, 1977
Subj. Customer Guidance on High Pressure Injection Operation
DISTRIBUTION
B.A.Karrasch
E.W.Swanson
R.J. Finnin
B.M. Dunn
D.W. LaBelle
N.S. Elliott
D.F. Hallman
Two recent events at the Toledo site have pointed out that perhaps we
are not giving our customers enough guidance on the operation of the
high pressure injection system. On September 24, 1977, after
depressurizing due to a stuck open electromatic relief valve, high
pressure injection was automatically initiated. The operator stopped
High Pressure Injection when pressurizer level began to recover, without regard to primary
pressure. As a result, the transient continued on with boiling in the
RCS, etc. In a similar occurrence on October 23, 1977, the operator
bypassed high pressure injection to prevent initiation, even though
reactor coolant system pressure went below the actuation point.
Since there are accidents which require the continuous operation of the
high pressure injection system, I wonder what guidance, if any, we
should be giving to our customers on when they can safely shut the
system down following an accident? I recommend the following guidelines
be sent;
a) Do not bypass or otherwise prevent the actuation of high/low
pressure injection under any conditions except a normal,
controlled plant shutdown.
b) Once high/low pressure injection is initiated, do not stop it
unless: Tave is stable or decreasing and pressurizer level is
increasing and primary pressure is at least 1600 PSIG and
increasing.
I would appreciate your thoughts on this subject.
JJK: jl
The referenced source document is basically a critique of these memos by textual communications experts discussing corporate communications problems. Here’s a summary. First, Kelly is talking “uphill” in the organization, so he couches his memo with that in mind. He asks no one for a decision, but basically asks for “thoughts”. And he makes a non-emphatic recommendation for “guidelines.” My own additional observations are that he dilutes the importance of the recommendation, and possibly adds confusion to it, by adding Low Pressure Injection, “LPI”, to the discussion, and most importantly that he totally misses any discussion of the part B problem. He does say “the operator stopped High Pressure Injection when Pressurizer level began to recover, without regard to primary pressure.” But there is no mention of the fact that the system response was not as expected, i.e. that the Pressurizer level went up drastically in response to the RCS boiling. He never articulates that the Operators’ reluctance to re-initiate High Pressure Injection, even after we understood the cause of the off-scale Pressurizer level indication, was based solely on that indicated Pressurizer level and our training. Thus the memo totally misses the part B point that the system response was not as expected by anybody, which was crucial to getting the guidance fixed.
The other thing I notice is the memo is not addressed to Walters. I’ve also “been there, done that” in a large organization. I can easily understand how the recipient (Walters’ boss) upon receiving this memo, with no specific articulation of a new problem (part B), would pass it to Walters with a “handle it, handle it… make it go away.” I also note the N.S. Elliott on the distribution. He was the B&W Training Department Manager, thus B&W training was directly in the loop on this issue also.
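For readers who think in logic rather than prose, the difference between stopping High Pressure Injection on Pressurizer level alone and Kelly’s proposed guideline (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the function names and the 1200 PSIG example value are hypothetical and come from no plant procedure or B&W document; only the three conditions and the 1600 PSIG threshold are taken from the Kelly memo as quoted above.

```python
from dataclasses import dataclass


@dataclass
class Indications:
    """Hypothetical snapshot of control room trends (names are illustrative)."""
    tave_increasing: bool          # False means Tave is stable or decreasing
    pzr_level_increasing: bool     # Pressurizer level trend
    rcs_pressure_psig: float       # primary (RCS) pressure
    rcs_pressure_increasing: bool  # primary pressure trend


def level_only_check(ind: Indications) -> bool:
    """Permit stopping HPI on recovering Pressurizer level alone.

    This mirrors the response the text describes (level recovering,
    no regard to primary pressure); it is not a quoted procedure.
    """
    return ind.pzr_level_increasing


def kelly_guideline_b(ind: Indications) -> bool:
    """Permit stopping HPI only if all of Kelly's guideline (b) conditions hold."""
    return (
        not ind.tave_increasing
        and ind.pzr_level_increasing
        and ind.rcs_pressure_psig >= 1600.0
        and ind.rcs_pressure_increasing
    )


# Assumed indications for a leak in the Pressurizer steam space:
# level rising while primary pressure is low and not recovering.
# The 1200 PSIG figure is an illustrative number, not event data.
steam_space_leak = Indications(
    tave_increasing=False,
    pzr_level_increasing=True,
    rcs_pressure_psig=1200.0,
    rcs_pressure_increasing=False,
)

print("Level-only check permits stopping HPI:", level_only_check(steam_space_leak))
print("Kelly guideline (b) permits stopping HPI:", kelly_guideline_b(steam_space_leak))
```

With level rising and pressure low, the level-only check permits stopping High Pressure Injection while guideline (b) does not. The guideline catches the case only because it adds pressure to the check; it still says nothing about why level and pressure would move in opposite directions, which is the part B point the memo misses.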
The Walters Response Memo
Note that the original Walters response memo to Kelly was handwritten, so it was apparently typed up somewhere along the line. This is how it appears in the reference source, typos and all.
MEMORANDUM THE BABCOCK & WILCOX COMPANY
To .i J Kelly, Plant i n t e g r a t i o n
From J .F. Walters, Nuciear Service
Cust. TOLEDO Date November 10, 1977
Sub.. High Pressure Tnjection during transient:
Ref. Your l e t t e r t o DISTRIBUTION; Same Subject
Dated NOV 1, 1977.
In talking with training personnel and in the opinion of this
writer the operators at Toledo responded in the correct manner
considering how they have been trained and the reasons behind this
trainin
My assumption and the training assumes first that RC Pressure and
Pressurizer Level will trend in the same direction under a LOCA. For a
small leak they keep the HP system on up to a certain flow to maintain
Pressure Level.
In the particular case at Toledo, there was no LOCA of magnitude
and with the small leak the inventory in the system came back as
expected but due to the recovery of the RCS the RCS pressure cannot
respond any quicker than the pressurizer heaters car, heat the cold water
now pushed back into the pressurizer. Leaving the H.P.I. system on
after Pressurizing Level indicator is listed high, will result in the RCS
pressure increasing and essentially hydroing rhe RCS when it becomes
solid. If this is the intent of your letter arid the thoughts behind. it.
then the operators are not taught to hydro the RCS everytine rhe HPT
pump is initiated.
If you intend to go solid what about problems with vessel
mechanics. Also will the code and electromagnetic valves relief water
(via steam) at significant flow rate to keep the RCS from being hydroed.
cc. R.J. FINNIN
I’m omitting the communications experts’ comments; they are in the reference. Here are my comments. First, in simple Operator lingo, this response is a “smart ass slap down” to Kelly, including all the accompanying sarcasm. But there are some very important admissions revealed here. First, there is an admission, drawing on Walters’ discussion with the B&W Training Department, that we responded in the correct manner considering how we were trained and the bases behind our training. This is what we Operators had been claiming all along, but nobody wanted to believe it. Second, Walters clearly states, as both his personal assumption and the B&W Training Department’s assumption, that RC pressure and Pressurizer level will trend in the same direction during a LOCA. Bingo. He has just admitted they still don’t “get it”, meaning the specific part B contribution to the problem. So they are in fact training wrong for this event because they don’t understand part B. Further, this discussion is happening after the DBNPP event, as a result of Kelly’s concerns, and well before TMI. Third, the tone of Walters’ sarcastic comments about a “hydro” (hydrostatic pressure test) of the RCS every time High Pressure Injection is initiated shows the disproportionate emphasis that the B&W training was placing on “never let High Pressure Injection pump you solid.” Again, this is something the Operators were claiming that nobody wanted to believe.
My conclusion, and it hasn’t changed in 35 years, is that the Root Cause of the TMI accident was that the very specific warning the DBNPP event provided, along with the three other general precursor event warnings, was missed by the PWR Industry in general and the AEC (NRC) specifically. The contributing factor is that the B&W simulator training and inadequate procedures put the TMI Operators in a box, outside of their Design Basis understanding for that specific SBLOCA. The cause of that is that B&W itself didn’t understand the actual plant response to that SBLOCA event, because it was never analyzed correctly.
For a long time I wondered why the Kemeny and Rogovin investigations didn’t reach the same specific conclusion as I have. After all, both investigations had some very smart people involved, and they both looked at the same evidence. My thinking today is that they did reach that same conclusion. But I don’t actually know what they may have seen as the bottom line purpose of their investigations either. If you consider that no investigation report was going to change the condition of TMI, it may have been as simple as this: there is enough wrong that needs fundamental changing, so let’s just get those changes done and move forward. So neither group saw a need to identify the actual bottom line root cause; rather, they just gave recommendations for preventing another TMI type accident. Further, by the time those two reports were published, it was well understood there was going to be a lawsuit between GPU and B&W. If one of those reports had specifically assigned B&W partial liability for the root cause, that conclusion, along with the report that made it, would inevitably have been dragged into the lawsuit. I have no doubt this was actually discussed at the time. And I will further speculate it was actually decided there was no reason to identify the true single root cause in the reports, because the lawsuit itself would decide that liability issue independently of the reports. That is exactly what lawsuits do! My problem with that is the lawsuit, which started in ’82, never really settled the liability issue, as it was mutually “settled” in ’83 before a conclusion was reached. And I will add that the settlement was financially beneficial to all parties concerned. B&W made money, GPU saved money, and both law firms made money.
Another thing I think was actually discussed at that time was that if the reports stated the root cause was that the B&W training put the Operators outside of the Design Basis understanding for that event, because the event wasn’t understood by B&W, it would open Pandora’s Box. They didn’t want to deal with “What else do you have wrong?”, and there was well over a hundred billion dollars’ worth of these NPPs still operating. This conclusion is strongly reinforced for me by the Kemeny Report section “Causes of the Accident”. This section of the report lists a “fundamental cause” as Operator Error, and specifically lists turning off High Pressure Injection early in the event. And then the report lists several “Contributing Factors”, including B&W missing the warning provided by the DBNPP event. If you read the list of contributing factors in the Kemeny Report there is a screaming omission: it is never stated that B&W (actually the whole PWR industry, if you consider the precursors) did not understand the actual plant response to a leak in the Pressurizer steam space (what I refer to here as part B of the problem). And that is why B&W and the NRC both missed the DBNPP warning. Virtually nothing will ever convince me that all those smart people did not put that truth together.
Thus it was both their fear of opening Pandora’s Box and a conscious decision that there was no need to implicate B&W with any partial liability that ruled the process. By doing that they collectively decided to throw the TMI Operators under the bus as the default position. My conclusion for the missing Contributing Factor problem is an Occam’s razor solution: it is not “missing” in the sense that they didn’t “Get It”; it was a conscious decision not to include it. After all, if that Contributing Factor had been included, who on earth would believe it was an Operator Error when they simply did what they were told to do in that situation? So they simply did not want to deal with the real issue: who made the error?
A Mysterious Newspaper Article
From the New York Times
January 25, 1983
By DAVID BIRD
The operator of the Three Mile Island nuclear plant agreed out of court yesterday to accept $37
million in settling its suit against the manufacturer of the disabled reactor.
The operator, the General Public Utilities Corporation, charged that it had suffered $4 billion in
damages on March 28, 1979, in the worst accident in the history of commercial nuclear power. The
settlement was reached after a trial in the suit had gone on for almost three months.
The companies were reported eager to bring the trial to an end because further disclosures could
damage the future of the nuclear power industry in which both parties had a large stake.
I'm going to assume Mr. Bird didn't make this up, but rather that someone he talked with for this story relayed this "eager" information to him. And it just seems to beg a question: just what "damaging" information could have been found in the discovery search for this lawsuit? Does it mean something even more damaging to the future of the nuclear power industry than Roger Mattson's blunder about the potential for a hydrogen bubble explosion? Something additional not found by the Kemeny and Rogovin investigations? Something seriously unknown inside Pandora's Box?
Or is this a simple editorial mistake, as if he meant "embarrassing", not damaging? I think being embarrassed was the least of what the TMI Operators have had to deal with.
A Simple Analogy
For years I struggled to find a simple analogy to explain the position the TMI Operators were placed in by their training. I wanted one that could be understood with common everyday knowledge, not one that required understanding the technical complications of nuke plant operations. One of the reasons that was difficult was that it required a “phenomenon” that is commonly understood today, but was not understood at all at the time of the training. This is the best I can come up with.
Suppose that in learning to drive a car you are being trained to respond to the car veering to the left. It’s simple enough: turn the steering wheel to the right to recover. It is also what your basic instinct would lead you to do, so there is no mental conflict in believing it. It is also reinforced and practiced during actual driver training on a curvy road. That response is soon embedded as the right thing to do. Now suppose your driver training also includes training on a Car Simulator training machine. It is where you learn and practice emergency situation driving. After all, nobody is going to do those emergency things in an actual car on the road.
Here’s where it gets complicated. Assume virtually no one yet understands that when the car skids to the left on ice (because of loss of front wheel steering traction), the correct response is to turn the steering wheel in the direction of the skid, that is, to the left. This is just the opposite of the non-ice response. And to make matters worse, because no one understands that yet, including the guy who built the Car Simulator, the Car Simulator has been programmed to make the wrong response work correctly on the Simulator. So in your emergency driver training you practice it this way. The Simulator responds incorrectly to the actual phenomenon, but it shows a successful result: you recover control by doing what is actually the wrong thing. Since this probably also agrees with your instinct, and you see success on the Simulator, this action is also embedded as the right thing to do. One additional point: if you don’t do this wrong action, you will flunk your Simulator driver training test.
So you know where this is going. Now you are out driving on an icy road for the first time and the car skids to the left. You respond exactly as you were instructed to do, and exactly as the Simulator showed was successful, and you have an accident because the car responds to the real world rules of Mother Nature. An investigation is obviously necessary because, I forgot to tell you, the car cost $4 billion and you don’t own it. During the subsequent investigation everything is uncovered: the unknown phenomenon is finally correctly understood, the Simulator’s incorrect programming is discovered, and it comes out that the previously unknown phenomenon had been discovered before your accident and that your accident had even been predicted as possible. But when the investigation results are published, the finding is that the accident was caused by your error of turning the steering wheel the wrong way on the ice. Nobody else is found to have made an error in the stated conclusions but you; it is simply a case of Driver Error. Would you feel like you had been screwed? This happened to the TMI Operators.
For everybody out there who doesn’t like my conclusions, I’ll just say that many of the principals of the investigations are still alive but choose not to talk, so simply ask them. Especially ask the principals in the GPU vs. B&W lawsuit, which should have determined any liability issues. Ask them why it didn’t happen. My idea of justice involves getting the truth, the whole truth, and nothing but the truth exposed. That process is still unfinished.
Various Errata From the Kemeny Commission Report
• George Kunder statement in Kemeny Commission Report, page 103.
George Kunder, superintendent of technical support at TMI-2, arrived at the Island about 4:45 a.m., summoned by telephone. Kunder was duty officer that day, and he had been told TMI-2 had had a turbine trip and reactor scram. What he found upon his arrival was not what he expected.
"I felt we were experiencing a very unusual situation, because I had never seen pressurizer level go high and peg in the high range, and at the same time, pressure being low," he told the Commission. "They have always performed consistently."
Kunder's view was shared by the control room crew. They later described the accident as a combination of events they had never experienced, either in operating the plant or in their training simulations.
-------------------------------------------------------------------------------------------------------------------------------------
Hi George, welcome to the club; a small group of similarly B&W trained Licensed Operators (about a dozen… at two different plants… independently eighteen months apart) who all independently agreed based on our training, that turning off the High Pressure Injection based on the indicated Pressurizer level was the correct thing to do. So we all made the same “Operator Error.” I’m really sorry about your sense of an “Unusual Situation” but I can clear that up in about a five minute conversation; something the rest of them couldn’t manage in eighteen months. It’s so pathetically simple that any licensed PWR Operator will easily get it; but it required simply looking at the event in real terms not fantasy terms. So… do you have a time machine? We need to talk, you pick the date.
-------------------------------------------------------------------------------------------------------------------------------------
• Summary conclusions in the Rogovin Report by the Essex Corporation, the Human Factors Engineering experts hired by NRC during the Rogovin Investigation to look at the TMI Operator training.
- Operators were exposed to training material but they certainly were not trained.
- They were exposed to simulators for the purpose of developing plant operation skills, but they were not skilled in the important skill areas of diagnosing, hypothesis formation, and control technique.
- They were deluged with detail yet they did not understand what was happening.
- The accident at TMI-2 on the 28th of March 1979 reflects a training disaster.
- The overall problem with the TMI training is the same problem with information display in the TMI-2 control room: application of an approach which inundates the operator with information and requires him to expend the effort to determine what is meaningful.
--------------------------------------------------------------------------------------------------------------------------------
Well… at least somebody “got it.” Too bad they were never asked to identify just who had made an “error.”
--------------------------------------------------------------------------------------------------------------------------------
• From Kemeny Report.
In the weeks following the accident, NRC apparently (apparently?) was confused as to what emergency procedures plant operators should follow. (I don’t understand the source of this confusion as the legal requirement is for the Operators to follow the approved Plant EOPs, not the written or verbal instructions of the NRC) Thus, within a short span of time, NRC issued and then either modified or contradicted its post-TMI emergency instructions.
- Immediately after the TMI accident, NRC directed operators not to override automatic engineered safety features under any circumstances and to operate high pressure injection without regard for reactor vessel pressure/temperature limits. (And we actually have to pay for this type of advice?)
- NRC modified this directive within a short time. (Whew…)
- On April 5, NRC required all licensees operating B&W-designed reactors to revise their procedures so that in the event of High Pressure Injection initiation with reactor coolant pumps (RCP) operating; at least two Reactor Coolant Pumps would remain operating. (Maybe I missed something… did they already pass a law forbidding loss of offsite power?)
- On July 26, NRC took the opposite position and directed licensees to shut down its pumps when High Pressure Injection initiated. (Oh? Did the “only after two-minutes” rule come later?)
- I&E, in its August 1979 report on the TMI accident, stated that the failure of the TMI operators to shut down the Reactor Coolant Pumps sooner than they did was a potential item of noncompliance.
--------------------------------------------------------------------------------------------------------------------------------
The above information would be pathetically amusing if it were not true. Unfortunately this is a snapshot in time, “in the weeks following the accident”, which has now expanded to “in the decades following the accident.” And it is only talking about one small issue in the grand scheme of Nuke Power; Reactor Coolant Pumps and High Pressure Injection. (Further discussion about this topic is beyond the scope of this document). This form of “regulation by hysteria” became particularly annoying to me, as by that time my job was involved with revising the Plant Emergency Operating Procedures. I remember on one occasion I even refused to make a change. I only include this information to point out a double standard in the use of the word “error”; there is obviously erroneous guidance in the above information.
But does the Kemeny Report use the word “error” to describe any of it? It is my opinion that in the Nuke World only an Operator can make an error. There apparently is no word in the English language to describe such a thing for a non-Operator. And for me the very ultimate example of this phenomenon is the single most powerful error that literally terrorized our nation after TMI, and is likely the error that caused a permanent loss of public confidence in commercial nuclear power. It occurred when Walter Cronkite went on live national TV and reported the NRC was saying “… melted reactor + possibility of explosion…" What the public heard was… A Nuke Bomb. There are literally volumes of written material and taped discussions about this item available in public documents. See if you can find a single reference that states NRC’s Roger Mattson, the erroneous source of the Hydrogen bubble explosion fiasco, made an error. And then compare that to the number of times you’ve read Operator Error melted the TMI core.
A Message
To the Operators at TMI-2: I’ve never met any of you guys, but I walked in your shoes, and none of us can change history. I tried to help in 1982 at the B&W vs. GPU lawsuit trial, but my testimony on what we were trained to do on the B&W Simulator was not allowed, because it was determined to be not relevant. I refer to this as a “trick play by the defense and a bad call by the ref.” I’d be willing to bet Captain Chesley Sullenberger, who ditched his commercial airliner in the Hudson River after total engine failure, wouldn’t agree with that call. In fact, I suspect that if he were asked what single thing contributed most to the successful outcome of that event, he’d say his Flight Simulator Training. After all, it’s not as if he’d been given an actual plane to practice the event on the river. Well…duh… if the simulator training is not relevant, exactly what is? But what if the simulator training had been wrong? I understand that turning off High Pressure Injection early in your event is not the “total” of that event, but I know why you did it; it was the same reason I did it: we were told to do it. You guys got thrown under the bus because of the Institutionalized Arrogance and Cognitive Dissonance of the whole Nuclear Industry.